Method for the performance testing of echo cancellers using an artificial segmented test signal

ABSTRACT

A method for the performance testing of an echo canceller including the steps of transmitting a test signal to the echo canceller, recording an echo energy signal output from the echo canceller as a result of the test signal, measuring the energy and duration of the echo energy signal and calculating a performance score for the echo canceller based on the measured energy and duration of the echo energy signal, wherein the test signal transmitted to the echo canceller has a plurality of discrete frequency band segments of white noise representing sub-bands of an overall bandwidth ranging from about 0 kHz to about 3.5 kHz. The test signal preferably has four discrete frequency band segments of white noise, wherein each discrete frequency band segment of the test signal is repeated in succession, whereby the test signal has a total of eight segments. Thus, the test signal can have a first frequency band segment having a bandwidth of about 0 kHz to about 1 kHz, a second frequency band segment having a bandwidth of about 1 kHz to about 2 kHz, a third frequency band segment having a bandwidth of about 2 kHz to about 3 kHz and a fourth frequency band segment having a bandwidth of about 3 kHz to about 3.5 kHz.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application Serial No. 60/470,666, filed May 15, 2003, the disclosure of which is incorporated herein by reference.

FIELD OF THE INVENTION

[0002] The present invention relates generally to communication systems and, more particularly, to a method for the objective performance testing of an echo canceller using a new artificial segmented test signal and echo measurement algorithm.

BACKGROUND OF THE INVENTION

[0003] All voice network service providers who want to introduce packet transport technology, or any other technology (e.g., digital cellular) that increases connection delay beyond approximately 25 msec (round trip), need to deploy echo cancellers within their networks. These devices need to perform their function across a wide range of usage variation. The basic objective of a well-performing echo canceller (EC) is to quickly recognize an echo and then rapidly “converge” to cancel the echo by reducing its level below the threshold of notice. Accordingly, it is very important to test an EC prior to deployment in the network to ensure that it will perform to expectations.

[0004] A challenge in the area of echo canceller (EC) performance testing concerns what test signal(s) to use. Today, the standard EC test is based on International Telecommunication Union-Standardization Sector (ITU-T) Recommendations G.165 and G.168. ITU-T Recommendation G.165 recommended the use of band-limited (300-3400 Hz) white noise as the test signal. The updated Recommendation G.168 defined and recommended a new test signal, the composite source signal (CSS), and the pass/fail criteria are based on results generated with this signal.

[0005] The CSS is a speech-like signal in that it has a power density spectrum similar to that of speech, it is interrupted by gaps that simulate the pauses found in speech, and it simulates both voiced and unvoiced sounds. Like all good test signals, the CSS can also be specifically defined and, thus, constructed with fidelity in different test labs. The CSS is generally regarded as a superior test signal to the white noise test signal that it replaces, and performance seen with the CSS is assumed to be predictive of how the EC will perform in the presence of speech. However, there is ample evidence that this prediction too often does not hold.

[0006] The existing “artificial” test signals used in the telecommunications industry for evaluating the performance of an echo canceller (EC) (i.e., those recommended by the ITU-T in Recommendations G.165 and G.168) frequently fail to predict how the echo canceller will perform in “real” uses (i.e., when the EC is working on actual speech signals). Specifically, testing an echo canceller with a CSS signal is not always predictive of its performance in the presence of speech. Consequently, pre-deployment performance testing of new EC designs does not provide the level of confidence desired in that the performance seen in the lab is not always the performance experienced in the field.

[0007] Since echo cancellers are deployed to control the echo of speech signals, the test signal of choice would seem to be a speech signal(s). It has been proposed to use a speech signal as the test signal, but there has been no agreement in the industry as to which speech signal to use. Speech signals vary widely from person to person and EC performance is sensitive to this variation.

[0008] Attempts to substitute speech signals in the G.168 testing reveal substantial performance variation driven by the specific speech sample in use. So it would appear that a sample of speech signals needs to be identified and used. This conclusion is not a satisfying one. G.168 EC testing is complicated enough with a single test signal in use. Trying to get general agreement to move to a sample of speech signals for testing is likely to be difficult if not impossible. Furthermore, to get lab-to-lab conformity, these signals would have to be shared since their independent reproduction is also not likely due to the complexity of real speech signals.

[0009] One approach to handling this complexity is to use multiple speech signals in an EC performance test and to score EC performance in terms of the results of a subjective mean-opinion-score (MOS) test, where groups of test subjects listen to the processed speech samples and rate them in terms of their transmission quality based on the residual echo heard with the speech samples. The problem with this approach is that it does not provide the single, objective test signal needed to do routine testing and for setting objective performance requirements. Also, the MOS technique is rather expensive to implement and, even under ideal conditions, can take days to conduct.

[0010] Accordingly, there is a need for an improved “objective” test signal for use in echo canceller performance evaluations. The currently used signals of band-limited white noise and the CSS both have general utility, but neither represents speech as well as it should. Thus, it would be desirable to provide an artificial test signal which generates performance that correlates highly with that observed with actual speech signals.

SUMMARY OF THE INVENTION

[0011] The present invention is a method for the performance testing of an echo canceller. The method generally includes the steps of transmitting a test signal to the echo canceller, recording an echo energy signal output from the echo canceller as a result of the test signal, measuring the energy and duration of the echo energy signal and calculating a performance score for the echo canceller based on the measured energy and duration of the echo energy signal, wherein the test signal transmitted to the echo canceller has a plurality of discrete frequency band segments of white noise.

[0012] In a preferred embodiment, the performance score is calculated by multiplying the measured energy and the measured duration of the echo energy signal and a pass/fail criterion can be developed based on the calculated performance score as compared to mean opinion score results of multiple speech sample testing. Also, the recorded echo energy signal preferably has a duration greater than about 5 msec and a level greater than about −50 dBm. Preferably, the steps of recording and measuring are repeated for each of a plurality of echo energy signals, wherein the performance score is calculated by summing the products of the measured energy multiplied by the measured duration for each of the echo energy signals.

[0013] The test signal of the present invention includes discrete frequency band segments representing sub-bands of an overall bandwidth ranging from about 0 kHz to about 3.5 kHz. In a preferred embodiment, the test signal has four discrete frequency band segments of white noise, wherein each discrete frequency band segment of the test signal is repeated in succession, whereby the test signal has a total of eight segments. Preferably, the test signal has a first frequency band segment representing white noise having a bandwidth of about 0 kHz to about 1 kHz, a second frequency band segment representing white noise having a bandwidth of about 1 kHz to about 2 kHz, a third frequency band segment representing white noise having a bandwidth of about 2 kHz to about 3 kHz and a fourth frequency band segment representing white noise having a bandwidth of about 3 kHz to about 3.5 kHz.

[0014] The test signal of the present invention further preferably has a plurality of discrete frequency band segments which are ordered within the test signal by increasing frequency. The plurality of discrete frequency band segments also preferably have an equal level of about −18 dBm and an equal duration of about 350 msec. The plurality of discrete frequency band segments of the test signal are further preferably separated from one another by a delay of about 150 msec and the level during the delay is about −65 dBm.

[0015] Other objects and features of the present invention will become apparent from the following detailed description considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed as an illustration only and not as a definition of the limits of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] FIG. 1 is a graphical representation of the segmented test signal formed in accordance with the present invention.

[0017] FIGS. 2 and 3 show some convergence speed results for two different echo cancellers in response to the 4-band signal formed in accordance with the present invention.

[0018] FIG. 4 shows the MOS result for four ECs tested with the test signal of the present invention at 5 ERL levels.

[0019] FIG. 5 plots the tested relationship between the Erep calculated on the recordings made with the speech signals and the MOS results across the four ECs of FIG. 4.

[0020] FIG. 6 plots the tested Erep for the 4-band signal as a function of both EC and ERL for the four ECs of FIGS. 4 and 5.

[0021] FIGS. 7a, 7b and 7c plot MOS against the tested Erep for each of the 4-band, white noise and CSS signals.

[0022] FIG. 8 shows the correlation between the 4-band test signal results and the speech samples' Erep score.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0023] Frequency variation is a major source of the variation actually observed between speech samples, especially across speaker gender and age. To address the need for a more representative objective test signal, it has been found that the most desirable test signal combines two attributes: it is objective (i.e., definable at a level that allows others to recreate it with fidelity), and it is representative of speech signals (i.e., it generates convergence speed performance results that correlate very well with MOS results obtained with actual speech signals).

[0024] FIG. 1 shows the new artificial test signal 10, for the objective performance testing of an echo canceller, formed in accordance with the present invention. The test signal 10 is a segmented white noise type signal including four (4) separate frequency bands 12a, 12b, 12c and 12d of white noise. The basic segments 12a, 12b, 12c and 12d represent sub-bands of an overall bandwidth ranging from 0 to 3.5 kHz. Preferably, each fundamental segment or band 12a, 12b, 12c and 12d appears twice in succession, thus resulting in the eight (8) segment signal shown in FIG. 1. The four fundamental segments 12a, 12b, 12c and 12d represent white noise having respectively different bandwidths: 0-1 kHz 12a, 1-2 kHz 12b, 2-3 kHz 12c, and 3-3.5 kHz 12d. In the overall test signal 10, these segments are ordered in terms of increasing frequency as shown in FIG. 1. All segments 12a, 12b, 12c and 12d are preferably equal in level at about −18 dBm and are preferably equal in duration for about 350 msec. Also, the segments 12a, 12b, 12c and 12d are preferably separated from one another by a delay or gap 14 of about 150 msec. It has been found that this sound level approximates the average speech level seen in the public switched telephone network (PSTN) and the gaps 14 of about 150 msec approximate the pauses seen in actual speech patterns. Additionally, the noise floor, or level during the gaps 14, is preferably about −65 dBm.
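The timing of the eight-segment signal described above can be sketched as a simple schedule. This is a minimal illustration in Python, not part of the disclosed method: the names `Segment` and `build_schedule` are hypothetical, and the sketch assumes that a gap follows every segment except the last, including between the two repetitions of the same band. It lays out only the timing, band edges and levels; it does not synthesize the noise itself.

```python
from dataclasses import dataclass

# Values taken from the description; all are "about" figures in the text.
BANDS_HZ = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 3500)]
SEGMENT_MS = 350
GAP_MS = 150
SEGMENT_LEVEL_DBM = -18.0
GAP_LEVEL_DBM = -65.0  # noise floor during the gaps
REPEATS = 2            # each band appears twice in succession


@dataclass(frozen=True)
class Segment:
    start_ms: int
    end_ms: int
    low_hz: int
    high_hz: int
    level_dbm: float


def build_schedule():
    """Return the eight noise segments in order of increasing frequency.

    Assumption of this sketch: a gap of GAP_MS follows every segment
    except the last one.
    """
    segments = []
    t = 0
    for low, high in BANDS_HZ:
        for _ in range(REPEATS):
            segments.append(
                Segment(t, t + SEGMENT_MS, low, high, SEGMENT_LEVEL_DBM)
            )
            t += SEGMENT_MS + GAP_MS
    return segments


schedule = build_schedule()
total_ms = schedule[-1].end_ms  # 8 * 350 ms of noise plus 7 * 150 ms of gaps
```

Under these assumptions the signal runs for 3850 msec from the start of the first segment to the end of the last.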

[0025] FIGS. 2 and 3 show some convergence speed results for two different echo cancellers (ECs) in response to the 4-band signal 10 formed in accordance with the present invention. The segmented test signal 10 passing through the EC in the send direction FI (G.168 term S_(in)) and NO (G.168 term R_(out)) EC ports is shown, as are the echo spikes leaking through the EC in the return direction FO port (G.168 term S_(out)). For the EC whose performance is shown in FIG. 2, there is no apparent initial echo at the lowest frequency (0-1 kHz) segment 12a. In both the second band 12b (1-2 kHz) and the third band 12c (2-3 kHz) a brief spike of echo is detected during the first replication, but full convergence (no echo) is detected on the second replication. In the fourth band 12d (3-3.5 kHz) a brief spike of echo is detected in the first replication and an even briefer, and attenuated, spike is detected in the second replication. The pre-convergence seen in the lowest frequency segment 12a is possibly a result of that EC declaring that low frequency as noise.

[0026] The second EC shows a different convergence pattern as illustrated in FIG. 3. At the beginning of the test signal 10, initial echo is detected in the lowest frequency band 12a that is captured (converged on) during the second replication. This same pattern is seen in the 1-2 kHz band 12b. Basically, no echo is seen in the next bands 12c and 12d. Given the frequency-time confounding inherent in the test signal 10, it is unclear whether the particular design of this EC allows adaptive filter control gained at lower frequencies to cause the NLP to function immediately for the higher frequency energy or whether this EC is pre-converged for high frequency energy.

[0027] Nevertheless, these results show that the test signal 10 according to the present invention reveals convergence performance both within and between frequency bands. This is the type of sensitivity that is most desired in an echo canceller test signal.

[0028] To expand the utility of the new test signal 10, a new method for testing the performance of an echo canceller is also provided. The first requirement of any useful EC convergence performance measure is that it link to the customers' experience. The psychophysics of human hearing indicates that, in sound detection, energy over time is integrated within a time frame of roughly 200 ms and, in echo perception, both echo duration and intensity are factored. Accordingly, for accurate echo canceller performance testing, it is desirable to calculate both the echo spike energy and the echo spike duration and integrate these values into one performance number.

[0029] The method, according to the present invention, utilizes an associated algorithm that examines the convergence period energy escaping from an echo canceller's S_(out) port and calculates a single statistic to represent this echo (Erep). Briefly, the new algorithm analyzes the residual echo energy (both power and duration) observed during performance testing and represents this echo energy in a single score that can be used to represent how well the echo canceller will perform in the presence of speech. A pass/fail criterion can then be developed based on correlation studies (measured echo to mean opinion score (MOS) results based on testing with multiple speech samples), and this criterion can serve as a new and more accurate requirement on echo canceller convergence speed.

[0030] For purposes of the present invention, an echo spike is defined as a burst of energy having a duration in excess of 5 ms and a level greater than −50 dBm. The −50 dBm threshold is based on much evidence that brief echo at and below −50 dBm has little negative subjective effect. Thus, one skilled in the art will recognize that this threshold is chosen empirically so as to maximize the predictive validity of the algorithm and that other thresholds may be used.

[0031] For each echo spike that exceeds the 5 ms/−50 dBm threshold, the algorithm multiplies the duration and energy to get a single score. Then, the algorithm sums the scores of such echo spikes to obtain an overall test score. The algorithm is expressed in the following equation:

$$\mathrm{Erep} = a \sum_{i} t_{i}\, p_{i}, \qquad t_{i} \ge 5\ \mathrm{ms}$$

[0032] where “t_(i)” is the ith echo spike duration (ms), “p_(i)” is the ith echo spike's average power level (dBm), “a” is an arbitrary scaling factor and “Erep” is the final score for the echo signal file. (A suitable scaling factor “a” for purposes of the present invention is 0.01.) Thus, the method according to the present invention includes the steps of calculating the power and the duration of each spike within a residual echo energy and then integrating the calculations over all the spikes seen to yield a single objective score. This algorithm may be advantageously implemented on programmable test equipment using a programmable computer, discrete digital circuits or application specific integrated circuits (ASICs).
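The spike detection and scoring steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the verification tests: the function names `find_spikes` and `erep` are hypothetical, the input is assumed to be a list of per-frame levels in dBm at a fixed frame length, and the average level of a spike is assumed to be computed on linear power and converted back to dBm. The frame values are made up for illustration.

```python
import math

LEVEL_THRESHOLD_DBM = -50.0  # spike level threshold from the description
MIN_DURATION_MS = 5.0        # spike duration threshold from the equation
SCALE_A = 0.01               # scaling factor "a" suggested in the description


def find_spikes(frame_levels_dbm, frame_ms):
    """Group consecutive frames above the level threshold into spikes.

    Returns a list of (duration_ms, average_level_dbm) tuples. Averaging
    on linear power (mW) is an assumption of this sketch.
    """
    spikes = []
    run = []
    # A sentinel at the threshold closes a run that reaches the last frame.
    for level in list(frame_levels_dbm) + [LEVEL_THRESHOLD_DBM]:
        if level > LEVEL_THRESHOLD_DBM:
            run.append(level)
            continue
        if run:
            duration_ms = len(run) * frame_ms
            if duration_ms >= MIN_DURATION_MS:
                mean_mw = sum(10 ** (x / 10) for x in run) / len(run)
                spikes.append((duration_ms, 10 * math.log10(mean_mw)))
            run = []
    return spikes


def erep(spikes, a=SCALE_A):
    """Erep = a * sum(t_i * p_i) over the qualifying spikes."""
    return a * sum(t * p for t, p in spikes)


# Made-up 1 ms frame levels: one 20 ms spike at -30 dBm, and one 3 ms
# burst at -40 dBm that is too short to count as a spike.
levels = [-70.0] * 10 + [-30.0] * 20 + [-70.0] * 10 + [-40.0] * 3 + [-70.0] * 5
spikes = find_spikes(levels, frame_ms=1.0)
score = erep(spikes)
```

Note that with levels entered directly in dBm below 0, each product t_(i) p_(i) is negative. The description does not state the level reference behind the positive Erep values reported in the example, so the sketch simply uses whatever level values the caller supplies.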

[0033] Verification tests of the present invention show that the objective score achieved with the new test signal and algorithm is highly correlated (0.89) with the MOS results. Thus, the test result achieved with the method of the present invention is highly predictive of how well an EC will perform when acting on actual speech signals.

EXAMPLE

[0034] To evaluate the new 4-band test signal and method according to the present invention, the following objective tests and comparisons to MOS test results were conducted.

[0035] Four different echo cancellers (ECs) were tested using the test signal and method of the present invention: EC1; EC2; EC3; and EC4. In addition, three different types of objective test signals were used: 1) the 4-band test signal of the present invention; 2) the CSS and the G.165 white noise signal; and 3) 8 different speech signals, each representing a different speaker (4 female, 4 male) speaking a unique pair of short sentences. Each test signal was processed through each EC at 5 different echo return loss (ERL) levels: 6, 8, 10, 12 and 14 dB. During this processing, the energy appearing at the EC's S_(out) port (i.e., the echo energy not captured by the EC) was recorded. The play/record process was done by a computer system equipped with a special dual-T1 board. Any computer equipped with the necessary hardware interface and software can be used to accomplish the play/record. The computer was connected to the test EC, either directly or through a PBX. The recorded echo samples were each processed via the method according to the present invention to generate an Erep score.

[0036] To prepare the speech-based recordings for use in the subsequent MOS test, 180 ms of “delay” was added to the front of the recorded echo samples obtained when the 8 speech signals were in use to simulate the round trip delay of a digital cellular connection and then these were mixed with the source samples. Where significant echo energy is present in the recordings, the delay and mixing strategy colors the source samples with the echo. These mixed files were subsequently rated for quality (by 33 subjects) within an MOS test.
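The delay-and-mix preparation described above can be sketched in a few lines. This is a minimal Python illustration; the function name `delay_and_mix` is hypothetical, the 8 kHz sampling rate is an assumption (the description does not state one), and mixing is assumed to be a plain sample-by-sample sum with no level adjustment.

```python
SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate
DELAY_MS = 180         # round-trip delay simulated in the example


def delay_and_mix(source, echo, delay_ms=DELAY_MS, rate_hz=SAMPLE_RATE_HZ):
    """Prepend silence to the echo recording and add it to the source."""
    pad = [0.0] * (delay_ms * rate_hz // 1000)
    delayed = pad + list(echo)
    # Zero-pad the shorter signal so both have the same length.
    length = max(len(source), len(delayed))
    src = list(source) + [0.0] * (length - len(source))
    dly = delayed + [0.0] * (length - len(delayed))
    return [s + e for s, e in zip(src, dly)]


# Made-up constant signals, used only to show where the echo lands.
mixed = delay_and_mix([1.0] * 2000, [0.5] * 2000)
```

At 8 kHz, 180 ms corresponds to 1440 samples, so in this toy input the echo first colors the source at sample index 1440.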

[0037] The criterion chosen for best objective signal is the one whose calculated Erep best correlates with the MOS results obtained in response to the speech samples that were processed. To be a generally useful test signal the absolute correlation obtained needs to be high.
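The selection criterion above rests on a correlation between Erep scores and MOS results. A minimal Python sketch of a Pearson correlation is given below for illustration; the description does not say which correlation statistic was used, so Pearson is an assumption, the function name `pearson` is hypothetical, and the number pairs are placeholders, not measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: a perfectly linear, decreasing relationship.
placeholder_erep = [10.0, 20.0, 30.0, 40.0]
placeholder_mos = [4.5, 4.0, 3.5, 3.0]
r = pearson(placeholder_erep, placeholder_mos)
```

A negative coefficient is the expected direction here, since more residual echo should lower the opinion score; it is the magnitude of the coefficient that indicates how well a candidate signal predicts the MOS results.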

[0038] FIG. 4 shows the MOS result for each of the four ECs at each of the 5 ERL levels. The error bars are standard deviation. From the graph it can be seen that ECs 1, 2 and 4 exhibit the expected positive relationship between MOS and ERL. For EC3, the MOS is flat across the ERL range. The difference among ECs is likely due to the complex result of different adaptive filter designs working in conjunction with different NLP thresholds. That said, it is expected that all ECs will show full operation (convergence) at and above some ERL level.

[0039] FIG. 5 plots the relationship between the Erep calculated on the recordings made with the speech signals and the MOS results across the four ECs. Each data point is the mean of 8 speaker files. The obtained correlation of −0.97 is very high, although it is somewhat expected since both the calculated Erep and MOS data are based on the same recordings.

[0040] FIG. 6 plots the Erep for the 4-band signal as a function of both EC and ERL. The general shape is in good agreement with the MOS results in FIG. 4, i.e., the Erep for EC3 is little affected by the ERL manipulation whereas a greater ERL effect is seen for the other 3 ECs in the sample.

[0041] FIGS. 7a, 7b and 7c plot MOS (collapsed over the EC variable) against Erep for each of the 4-band, white noise and CSS signals. The 4-band signal clearly correlates best with the MOS score. It can be seen that the MOS score stays above 4 when the Erep is below 100. That suggests, for the current 4-band signal level, that if the score is below 100 the echo is not noticeable. The data presented in FIG. 5, based on the Erep for speech signals, would lead to a similar conclusion for the test signal. The overall data suggested that a pass/fail criterion of Erep<100 might be too stringent. Thus, taking the ERL variable into consideration, the criterion can be relaxed slightly whereby Erep<200 for ERLs>10 dB can be adopted, since ERLs below 10 dB are rarely encountered in the field.
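The relaxed criterion stated above (Erep<200 for ERLs above 10 dB) can be expressed as a small check. This is a hedged Python sketch: the function name `check_convergence` is hypothetical, the scores passed to it are made up, and returning `None` when the ERL is at or below 10 dB is an assumption of the sketch, since the text gives no criterion for that range.

```python
EREP_LIMIT = 200.0  # relaxed pass/fail limit from the example
MIN_ERL_DB = 10.0   # the criterion is stated only for ERLs above this value


def check_convergence(erep_score, erl_db):
    """Return True or False when the criterion applies, otherwise None."""
    if erl_db <= MIN_ERL_DB:
        return None  # assumption: no verdict outside the stated ERL range
    return erep_score < EREP_LIMIT


# Made-up scores, used only to exercise the three possible outcomes.
results = [
    check_convergence(150.0, 12),
    check_convergence(250.0, 14),
    check_convergence(150.0, 8),
]
```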

[0042] Another relationship of interest is that between the 4-band signal's Erep and the speech samples' Erep. As shown in FIG. 8, the 4-band signal results correlate very highly with the speech sample results. This lends further support for the utility of the 4-band signal of the present invention.

[0043] As a result of the present invention, a better objective test signal and an associated measurement method are provided. By using a more accurate test signal, a network service provider will be able to reduce its costs of conducting pre-deployment echo canceller testing. Although the focus here was on echo canceller convergence speed, the 4-band test signal and measurement algorithm should be useful in testing other echo canceller performance areas since the underlying factor, residual echo, is constant.

[0044] While there have been described what are presently believed to be the preferred embodiments of the invention, those skilled in the art will realize that various changes and modifications may be made to the invention without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as fall within the scope of the invention.

What is claimed is:
 1. A test signal for the performance testing of an echo canceller, the test signal comprising a plurality of discrete frequency band segments of white noise.
 2. A test signal as defined in claim 1, wherein said discrete frequency band segments represent sub-bands of an overall bandwidth ranging from about 0 kHz to about 3.5 kHz.
 3. A test signal as defined in claim 2, comprising four discrete frequency band segments of white noise.
 4. A test signal as defined in claim 3, wherein each discrete frequency band segment is repeated in succession, whereby said test signal comprises a total of eight segments.
 5. A test signal as defined in claim 3, comprising: a first frequency band segment representing white noise having a bandwidth of about 0 kHz to about 1 kHz; a second frequency band segment representing white noise having a bandwidth of about 1 kHz to about 2 kHz; a third frequency band segment representing white noise having a bandwidth of about 2 kHz to about 3 kHz; and a fourth frequency band segment representing white noise having a bandwidth of about 3 kHz to about 3.5 kHz.
 6. A test signal as defined in claim 1, wherein said plurality of discrete frequency band segments are ordered within said test signal by increasing frequency.
 7. A test signal as defined in claim 1, wherein said plurality of discrete frequency band segments have an equal level of about −18 dBm.
 8. A test signal as defined in claim 1, wherein said plurality of discrete frequency band segments have an equal duration of about 350 msec.
 9. A test signal as defined in claim 1, wherein said plurality of discrete frequency band segments are separated from one another by a delay of about 150 msec.
 10. A test signal as defined in claim 9, wherein said test signal has a level during said delay of about −65 dBm.
 11. A method for the performance testing of an echo canceller comprising the steps of: transmitting a test signal to said echo canceller, said test signal having a plurality of discrete frequency band segments of white noise; recording an echo energy signal output from said echo canceller as a result of said test signal; measuring the energy and duration of said echo energy signal; and calculating a performance score for said echo canceller based on said measured energy and duration of said echo energy signal.
 12. A method as defined in claim 11, wherein said performance score is calculated by multiplying said measured energy and said measured duration of said echo energy signal.
 13. A method as defined in claim 11, further comprising the step of developing a pass/fail criterion based on said calculated performance score as compared to mean opinion score results of multiple speech sample testing.
 14. A method as defined in claim 11, wherein said recorded echo energy signal has a duration greater than about 5 msec and a level greater than about −50 dBm.
 15. A method as defined in claim 11, further comprising the step of repeating said recording and said measuring steps for each of a plurality of echo energy signals, wherein said performance score is calculated by summing the products of said measured energy multiplied by said measured duration for each of said echo energy signals.
 16. A method as defined in claim 11, wherein said discrete frequency band segments of said test signal represent sub-bands of an overall bandwidth ranging from about 0 kHz to about 3.5 kHz.
 17. A method as defined in claim 16, wherein said test signal comprises four discrete frequency band segments of white noise.
 18. A method as defined in claim 17, wherein each discrete frequency band segment of said test signal is repeated in succession, whereby said test signal comprises a total of eight segments.
 19. A method as defined in claim 17, wherein said test signal comprises: a first frequency band segment representing white noise having a bandwidth of about 0 kHz to about 1 kHz; a second frequency band segment representing white noise having a bandwidth of about 1 kHz to about 2 kHz; a third frequency band segment representing white noise having a bandwidth of about 2 kHz to about 3 kHz; and a fourth frequency band segment representing white noise having a bandwidth of about 3 kHz to about 3.5 kHz.
 20. A method as defined in claim 11, wherein said plurality of discrete frequency band segments of said test signal are ordered within said test signal by increasing frequency.
 21. A method as defined in claim 11, wherein said plurality of discrete frequency band segments of said test signal have an equal level of about −18 dBm.
 22. A method as defined in claim 11, wherein said plurality of discrete frequency band segments of said test signal have an equal duration of about 350 msec.
 23. A method as defined in claim 11, wherein said plurality of discrete frequency band segments of said test signal are separated from one another by a delay of about 150 msec.
 24. A method as defined in claim 23, wherein said test signal has a level during said delay of about −65 dBm. 